LightDM failsafe handler and restart directives #340

wallentx · 2024-01-04T11:45:14Z

This PR is in reference to #337

In a few different scenarios, I've experienced situations where a child process spawned by LightDM (such as x11) fails to start. As per the instructions in lightdm.service, upon failure, LightDM will Restart=always.
The problem that can come from this, is when one of those services are failing for a very specific reason that requires the user to make changes. In a scenario where such a service will fail 100% of the time, this will cause LightDM to Restart=always 100% of the time, when will eventually trigger systemd's rate-limiting mechanism, resulting in logs that would appear like this:

systemd[1]: lightdm.service: Service hold-off time over, scheduling restart.
systemd[1]: Stopped LightDM.                        
systemd[1]: lightdm.service: Start request repeated too quickly.
systemd[1]: Failed to start LightDM.                
systemd[1]: lightdm.service: Unit entered failed state.       
systemd[1]: lightdm.service: Failed with result 'exit-code'.

The problem is that, depending on the failure, the user might only see a black screen, with no indication that anything is wrong, except that the screen is black, and nothing is loading. The user would only be able to discover details in the aforementioned logs if they were to switch to a different TTY, log in, and check their logs. From here, they would be able to discover why the failure happened, and fix the issue.

The changes that I am proposing are essentially a shortcut to what I just described.

The lightdm.service file now includes StartLimit IntervalSec and Burst values, a RestartSec rate, and an OnFailure action. The values I've proposed I think make sense:
The service has a restart delay of 5 seconds in between resets, and if the service restarts more than 5 times within a 60s period, then the service will be in a failed state. In the event of it being in a failed state, it triggers the OnFailure action:debian/lightdm-failure-handler.service

This service makes a copy of /etc/issue (the MOTD-like message you see above a TTY login) to /etc/issue.restore, and writes in a new message "hint" to /etc/issue that lets the user know what to do. It then changes the user to TTY4 (I only chose 4 because I've noticed others to be reserved for other things, or unavailable.. someone should check my reasoning here though) so that they can see that message, and log in to troubleshoot.

Upon successful login of LightDM, if an /etc/issue.restore file exists, it will replace /etc/issue (which presumably has the "hint" we wrote earlier). If no .restore file is found, no action is taken.

I couldn't find anything else in the repo specifically that referenced the packaging or manifest of the .service files, but maybe that is handled elsewhere.

github-actions · 2024-01-04T11:45:31Z

Everyone contributing to this PR have now signed the CLA. Thanks!

wallentx · 2024-01-04T11:46:59Z

Hey! wallentx has not signed the Canonical CLA which is required to get this contribution merged on this project.
Please head over to ubuntu.com/legal/contributors to read more about it.

I signed this, btw, but maybe my launchpad ID was supposed to be my launchpad account email?

robert-ancell · 2024-01-16T02:31:47Z

Can you please resolve the above conflicts? Thanks.

robert-ancell · 2024-03-20T01:31:46Z

Sorry, @wallentx needs a rebase again.

iphands · 2024-03-28T04:54:32Z

This is great @wallentx I want this too!

Its rather maddening when you get caught in a restart loop and as you are at some tty, the restart occurs and in the middle of typing you get flipped back to tty1 as things fail.

Having some back off / retry limit would be great ootb config!

…sn't exist

github-actions · 2024-04-06T05:32:00Z

Hey! wallentx has not signed the Canonical CLA which is required to get this contribution merged on this project.

Please head over to https://ubuntu.com/legal/contributors to read more about it.

wallentx · 2024-06-07T21:30:23Z

Hey @robert-ancell I've been held up on attempting to fix my Canonical CLA issue because I found a bug/exploit/bypass with the Canonical cla-check itself, and have a PR that shows it's existence, and also includes the fix canonical/has-signed-canonical-cla#38

If I fix my CLA, then the evidence of the flaw disappears. Do you know anyone who might be able to at least ack the details of my other PR so that I can be freed up to move forward here?

robert-ancell · 2024-06-10T00:26:52Z

@wallentx thanks for working on this! I've pinged the person who seems to be maintaining the CLA checker and hopefully they'll land that fix soon :)

wallentx mentioned this pull request Jan 6, 2024

Warn/disallow Restart=always without preventing repeated too quickly loop systemd/systemd#30804

Open

wallentx added 7 commits April 5, 2024 16:36

Update lightdm.service

08cac24

Create lightdm-failure-handler.service

9128c07

Create lightdm-success-handler.service

df35cf2

Update lightdm.service

a7d95cb

Update lightdm.service RestartSec

2a2a2f5

Update lightdm-success-handler.service in the event issue.restore doe…

e0483c1

…sn't exist

Only making success handler act if a restore file exists

0c3b360

wallentx force-pushed the wallentx/lightdm-failsafe-handler branch from 939c34d to 0c3b360 Compare April 6, 2024 05:31

wallentx mentioned this pull request Apr 6, 2024

Add handling for when author is null canonical/has-signed-canonical-cla#38

Open

wallentx mentioned this pull request Jun 7, 2024

Why not show an error if it fails winesapOS/winesapOS#830

Closed

5 tasks

robert-ancell merged commit f50ba25 into canonical:main Jun 10, 2024
1 check failed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LightDM failsafe handler and restart directives #340

LightDM failsafe handler and restart directives #340

wallentx commented Jan 4, 2024

github-actions bot commented Jan 4, 2024 •

edited

Loading

wallentx commented Jan 4, 2024

robert-ancell commented Jan 16, 2024

robert-ancell commented Mar 20, 2024

iphands commented Mar 28, 2024

github-actions bot commented Apr 6, 2024

wallentx commented Jun 7, 2024

robert-ancell commented Jun 10, 2024

LightDM failsafe handler and restart directives #340

LightDM failsafe handler and restart directives #340

Conversation

wallentx commented Jan 4, 2024

github-actions bot commented Jan 4, 2024 • edited Loading

wallentx commented Jan 4, 2024

robert-ancell commented Jan 16, 2024

robert-ancell commented Mar 20, 2024

iphands commented Mar 28, 2024

github-actions bot commented Apr 6, 2024

wallentx commented Jun 7, 2024

robert-ancell commented Jun 10, 2024

github-actions bot commented Jan 4, 2024 •

edited

Loading